Assessment of lymph node area coverage with total marrow irradiation and implementation of total marrow and lymphoid irradiation using automated deep learning-based segmentation

Background Total marrow irradiation (TMI) and total marrow and lymphoid irradiation (TMLI) offer advantages over conventional total body irradiation. However, delineating target lesions according to TMI and TMLI plans is labor-intensive and time-consuming. In addition, although the delineation of target lesions differs between TMI and TMLI, the clinical distinction is not clear, and the lymph node (LN) area coverage during TMI remains uncertain. Accordingly, this study calculates the LN area coverage according to the TMI plan. Further, a deep learning-based model for delineating LN areas is trained and evaluated.
Methods Whole-body regional LN areas were manually contoured in patients treated according to a TMI plan. The dose coverage of the delineated LN areas in the TMI plan was estimated. To train the deep learning model for automatic segmentation, additional whole-body computed tomography data were obtained from other patients. The patients and data were divided into training/validation and test groups, and models were developed using the "nnU-Net" framework. The trained models were evaluated using the Dice similarity coefficient (DSC), precision, recall, and Hausdorff distance 95 (HD95). The time required for manual contouring was measured and compared with the time required to manually trim the contours predicted by the deep learning model.
Results The dose coverage for LN areas by the TMI plan had V100% (the percentage of volume receiving 100% of the prescribed dose), V95%, and V90% median values of 46.0%, 62.1%, and 73.5%, respectively. The lowest V100% values were identified in the inguinal (14.7%), external iliac (21.8%), and para-aortic (42.8%) LNs. The median values of DSC, precision, recall, and HD95 of the trained model were 0.79, 0.83, 0.76, and 2.63, respectively. The times required for manual contouring and for trimming the predicted contours differed to a statistically significant degree.
Conclusions The dose coverage in the inguinal, external iliac, and para-aortic LN areas was suboptimal when treatment was administered according to the TMI plan. This research demonstrates that the automatic delineation of LN areas using deep learning can facilitate the implementation of TMLI.


Introduction
Total marrow irradiation (TMI) and total marrow and lymphoid irradiation (TMLI) are both used as conditioning regimens prior to allogeneic and autologous hematopoietic stem cell transplantation for treating blood cancers (e.g., leukemia, lymphoma, and myeloma) [1]. Total body irradiation (TBI) is a conventional treatment for the same purpose [2]; however, it can cause multiple toxicities despite limited prescribed radiation doses. Although it affords certain advantages over a chemotherapy-only regimen [3], TBI usage has declined [4] due to its toxicity and the necessity for specialized facilities. However, advances in radiotherapy (RT) techniques have reduced the exposure of organs at risk (OARs) to radiation [5]. For example, in some clinical studies, TMLI has shown promising results, such as a lower incidence of radiation-related toxicity and extramedullary relapse [6]. This demonstrates the potential for increasing prescription doses to target lesions [7].
Both TMI and TMLI can be used as pretreatments; however, a major difference is observed between them. The former only targets the bone marrow, whereas the latter targets both the bone marrow and major lymph nodes (LNs). Despite these different target lesions, the clinical variation between TMI and TMLI remains unknown; further research is required to compare these therapies. Moreover, planning and conducting clinical studies for TMI and TMLI is challenging due to their low prevalence and the lengthy follow-up required to demonstrate significant differences. Therefore, in such cases, it may be cautiously posited that clinical differences can be indirectly inferred through differences in dose distribution.
If clinical differences exist, the delineation of the whole-body bone marrow, LN areas, and OARs is essential for TMLI treatment planning. However, this planning process is time-consuming. One study reports that approximately 12-16 h per patient are necessary [8]. Hence, the process is labor-intensive and poses a challenge for clinical adoption.
Recent advancements in RT have increasingly utilized deep learning technologies. These include applications in generating planning CT data from diagnostic MRIs [9], assisting in target contouring [10], creating treatment plans [11], and predicting radiotherapy outcomes [12]. A noteworthy aspect of this evolution is the significant reduction in target contouring time achieved through deep learning, as confirmed in one study [10]. This progress suggests a potential decrease in the time required for contouring in TMLI. The "nnU-Net" framework, prominent among deep learning models in these studies, stands out for its excellent performance and user-friendliness, making it accessible even to those without expert-level knowledge in computer programming [13].
In this study, dose coverage is estimated by delineating whole-body LN areas based on the computed tomography (CT) data of patients treated according to the TMI plan. Furthermore, a deep learning model is trained and evaluated for auto-contouring whole-body LN areas using a deep learning framework named "nnU-Net".

Patients and target structure delineation
In this study, the CT data of patients treated with TMI in 2017 and the CT data of patients who underwent whole-body CT scans from January 1, 2021 to December 31, 2021 at our institution were collected. The median age of the patients at the time of TMI was 40 years (interquartile range (IQR): 28-54 years), and 85% were male; 69% were diagnosed with leukemia. The median height, weight, and body mass index (BMI) of the TMI patients were 170.8 cm (IQR: 164.8-175.8 cm), 69.1 kg (IQR: 63.5-80.3 kg), and 23.9 kg/m² (IQR: 23.3-25.7 kg/m²), respectively. Regarding the prescribed doses for the 13 patients, one received 8 Gy in 4 fractions, another received 12 Gy in 4 fractions, and the remaining 11 patients received 10 Gy in 5 fractions.
To train the deep learning model, data were collected from 13 patients who underwent whole-body CT at our institution between January 1, 2021, and December 31, 2021. The median age of the patients at the time of the CT scan was 64 years (IQR: 56-72 years), with 46% being male and 69% undergoing whole-body CT for multiple myeloma diagnosis. Their median height, weight, and BMI were 162.25 cm (IQR: 150.75-164.5 cm), 61 kg (IQR: 53.5-68.5 kg), and 25.1 kg/m² (IQR: 23.4-25.7 kg/m²), respectively. All collected CT data were non-contrast-enhanced CT scans; the average pixel size was 1.113 mm × 1.113 mm, and the slice thickness was consistently 5 mm.

Dose estimation for LN areas treated by TMI
To calculate the dose received by each LN area under the TMI plan, information from patients who underwent TMI was used. The dose plans from previous TMI treatments were reloaded, and the manually delineated LN areas, expanded by a 5 mm margin to form planning target volumes, were used in each plan to calculate the dose for each area. For each patient, Vx%, which is the percentage of the volume of the structure receiving x% or more of the prescription dose, was calculated.
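The Vx% calculation described above can be sketched as follows. This is a minimal illustration, not the treatment planning system's implementation: the function name `v_x_percent`, the toy dose grid, and the assumption that all voxels have equal volume (so that counting voxels is equivalent to measuring volume) are ours.

```python
import numpy as np

def v_x_percent(dose_gy, mask, prescription_gy, x_percent):
    """Percentage of the masked volume receiving at least x% of the prescription dose.

    Assumes every voxel has the same physical volume, so voxel counts stand in
    for volumes (an assumption made for this sketch).
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask is empty")
    threshold_gy = prescription_gy * x_percent / 100.0
    doses_in_structure = np.asarray(dose_gy, dtype=float)[mask]
    covered = np.count_nonzero(doses_in_structure >= threshold_gy)
    return 100.0 * covered / doses_in_structure.size

# Toy example (hypothetical values): six voxels of one LN area, 10 Gy prescription.
dose = np.array([[10.0, 9.6, 9.1],
                 [8.0, 10.2, 9.7]])
ln_mask = np.ones_like(dose, dtype=bool)

v100 = v_x_percent(dose, ln_mask, prescription_gy=10.0, x_percent=100)
v95 = v_x_percent(dose, ln_mask, prescription_gy=10.0, x_percent=95)
v90 = v_x_percent(dose, ln_mask, prescription_gy=10.0, x_percent=90)
```

In this toy grid, two, four, and five of the six voxels reach the 100%, 95%, and 90% thresholds, respectively.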

Development of deep learning model
To train the deep learning model for medical image segmentation tasks, the "nnU-Net" framework was utilized [13]. This framework is characterized by its ability to function without extensive manual data pre-processing or parameter tuning. This is achieved through an "out-of-the-box" approach in which the framework automatically configures its pre-processing, network architecture, and training settings according to the properties of the target data. Hence, the framework is an ideal choice for users who lack expertise in image segmentation and deep learning. The "nnU-Net" framework has been extensively evaluated in a variety of imaging tasks and shown to achieve state-of-the-art performance, especially in medical imaging tasks. The patients from whom CT data were collected were divided, with 16 individuals allocated to the training group, 4 to the validation group, and 6 to the test group. Training was performed by fivefold cross-validation using a 4:1 split of CT data from the training/validation group; the model with the best evaluation metric value was selected. Training and testing were performed using a workstation equipped with an NVIDIA RTX A6000 with 48 GB of graphics processing unit memory.
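The fivefold cross-validation with a 4:1 split can be illustrated with a short sketch. It is not the framework's internal splitting code; the function name `five_fold_splits`, the case identifiers, and the random seed are hypothetical and chosen only for illustration.

```python
import random

def five_fold_splits(case_ids, n_folds=5, seed=0):
    """Return a list of (train_ids, val_ids) pairs, one per fold.

    Each case appears in exactly one validation fold; with 20 cases and
    5 folds, every split is 16 training cases and 4 validation cases (4:1).
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is reproducible
    folds = [ids[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        val_ids = folds[k]
        train_ids = [c for j, fold in enumerate(folds) if j != k for c in fold]
        splits.append((train_ids, val_ids))
    return splits

# Hypothetical identifiers for the 20 training/validation patients.
cases = [f"case_{i:02d}" for i in range(20)]
splits = five_fold_splits(cases)
```

Each of the five models is trained on 16 cases and validated on the remaining 4, and the 6 test patients are kept out of this loop entirely.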
The models developed based on "nnU-Net" in this study are available in the Zenodo repository (doi.org/10.5281/zenodo.7839889).

Model evaluation
The trained models were evaluated using the Dice similarity coefficient (DSC), precision, recall, and Hausdorff distance 95 (HD95). The DSC quantifies the overlap between two contours, such as an observer contour and a consensus contour; it is commonly employed as an index for image segmentation evaluation. The coefficient ranges from 0 (indicating no overlap) to 1 (indicating a perfect match). When the ground truth volume is denoted by G and the volume predicted by the trained deep learning model by P, the DSC, precision, and recall are calculated as 2×|G∩P|/(|G|+|P|), |G∩P|/|P|, and |G∩P|/|G|, respectively. The Hausdorff distance measures the distance between two sets as the largest of the distances from each point in one set to the nearest point in the other set. HD95 replaces this maximum with the 95th percentile of those distances and is used because it is less sensitive to outliers. The evaluation metrics were calculated by aggregating values from LN areas, including both left/right and superior/inferior divisions. For the patients in the test group, the times necessary for manually delineating whole-body LN areas and for trimming the predicted LN areas to accurate contours were measured and compared. This process is presented in Fig 1.
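A minimal sketch of these metrics is given below, using the standard definitions precision = |G∩P|/|P| and recall = |G∩P|/|G|. The function names, the toy masks, and the brute-force distance computation are our assumptions; HD95 conventions also vary between toolkits, and this sketch takes the larger of the two directed 95th-percentile distances, which may differ from the variant used in the study.

```python
import numpy as np

def overlap_metrics(gt, pred):
    """Return (DSC, precision, recall) for two binary masks of equal shape."""
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    g, p = np.count_nonzero(gt), np.count_nonzero(pred)
    if g == 0 or p == 0:
        raise ValueError("both masks must contain at least one voxel")
    inter = np.count_nonzero(gt & pred)
    return 2.0 * inter / (g + p), inter / p, inter / g

def hd95(points_a, points_b):
    """95th-percentile Hausdorff distance between two point sets.

    Points are rows of coordinates, assumed to be already scaled to physical
    units. Brute-force pairwise distances: fine for small sets only.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # nearest-neighbour distance for each point of a
    b_to_a = d.min(axis=0)  # nearest-neighbour distance for each point of b
    return max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95))

# Toy masks (hypothetical): 4 ground-truth voxels, 5 predicted, 3 shared.
gt_mask = np.array([1, 1, 1, 1, 0, 0])
pred_mask = np.array([0, 1, 1, 1, 1, 1])
dsc, precision, recall = overlap_metrics(gt_mask, pred_mask)

# Two single-point "surfaces" separated by a 3-4-5 triangle.
distance = hd95([[0.0, 0.0]], [[3.0, 4.0]])
```

For real contours, the point sets would be the surface voxels of each mask multiplied by the voxel spacing, and a spatial index would replace the pairwise distance matrix.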

Ethics statement
This study was approved by the Institutional Review Board of Seoul National University Hospital (IRB No. H-2204-050-1314). Personally identifiable information required for the CT data collection process was anonymized after collection. The IRB waived the requirement for informed consent because this is a retrospective medical record review study, obtaining participants' consent was not practical at the time of the study, and the risk to participants of waiving consent is extremely low.

Development of deep learning model
Out of a total of 26 patient datasets, 20 were assigned to the training/validation group. Within this group, the 20 patient datasets were divided in a 4:1 ratio to facilitate 5-fold cross-validation. The model that exhibited the highest performance during this cross-validation process was then selected. Following this selection, the best-performing model was further evaluated using the test group.

Discussion
This study aims to calculate the dose coverage of the LN areas when treatment is delivered according to the TMI plan, which has not been previously reported. The results show that the inguinal, external iliac, and para-aortic LN areas have lower coverage than expected. This is probably due to the anatomical location of these LNs away from the bone. Targeting whole-body bones can be an appropriate pre-transplant conditioning regimen for hematopoietic stem cell transplantation given that most hematopoietic stem cells are present in the bone marrow. However, because malignant cells are also located in major LNs, additional targeting of LN areas may provide better clinical outcomes. Although comparative studies have shown that TMI and TMLI have clinical outcomes similar to those of TBI [6], no prospective or retrospective studies comparing the clinical outcomes of TMI and TMLI have been conducted. Future studies are necessary to compare the TMI and TMLI plans in terms of the various dose parameters of OARs and dose coverage.
The second objective of the current study is to develop a deep learning model for the automatic delineation of whole-body LN areas because manual delineation is a time-consuming and labor-intensive process when TMLI is delivered. A model based on the "nnU-Net" deep learning framework was devised and subsequently trained and tested using the data from the 26 patients treated at the institution. Although the amount of training data was small, the model performance and quality of the contours were acceptable for TMI planning. The predicted presacral and para-aortic LN areas were observed to have higher HD95 values than the other LN areas. This may be because these two LN areas have no laterality; consequently, they provide fewer training examples than LN areas with left and right sides, and training with few samples is less effective for them. Nevertheless, the automatic delineation of whole-body LN areas using the developed model was demonstrated to be significantly more time-effective than manual delineation and has the potential to reduce the time required for total treatment planning. Several papers on auto-contouring models using deep learning have been published, such as models that auto-contour the target lesions of the head and neck, thorax, rectum, cervix, prostate, and heart structures [10,20-25]. In TMLI, target areas include bone and LN areas. For bone, due to the significant density difference with surrounding tissues, generally favorable results are observed [25]. Conversely, in LN areas, the density contrast with adjacent tissues is not as pronounced, leading to varied outcomes in different studies. Among these, Rhee et al. [22] employed deep learning to automatically segment the clinical target volume (CTV) in cervical cancer, achieving a mean DSC and HD95 of 0.81 and 2.09, respectively, for nodal CTV, and 0.76 and 2.00 for PAN CTV. Similarly, Cardenas et al. [21] developed an auto-segmentation model for LN CTV in head and neck cancer patients, yielding a mean DSC between 0.843 and 0.909. Our study demonstrated a median DSC of 0.79 and a median HD95 of 2.63, presenting a competitive performance in comparison with existing studies.
In contrast to these prevalent region-specific studies, whole-body auto-contouring models have received less attention in the current research landscape. Chen et al. [25] reported a model that can auto-contour whole-body organs. Most models utilize a limited range of CT data for each organ, or the entire body is divided into regions that are trained separately to create individual models. In contrast, a model based on whole-body CT data is developed in the current study; thus, a prediction for whole-body LN areas can be provided immediately.
One study has shown that increasing the radiation dose in TBI results in better disease control [26]. Although disease control was improved, it was not linked to improved overall survival. This might be due to the toxicity of high doses to OARs. Thus, TMI and TMLI have been proposed as alternatives to reduce toxicity [5] despite the absence of published guidelines. Recently, Dei et al. reported the variability in TMLI target delineation [27]. The study found that inter-observer and intra-observer variabilities were reduced when guidelines for delineating each regional LN area were provided. Dei et al. provided documented guidelines for the current practice. Additionally, if a deep learning model can be used to provide patient-specific LN area delineation consistently, reduced variability in target delineation may be expected in TMLI planning.
One of the limitations of the current study is the small number of patients. In addition to the data of patients treated with TMI at our institution, whole-body CT data were used as additional training data. Consequently, patient posture and image resolution differ between the RT planning CT scans and the whole-body CT scans. Further studies are therefore required for external validation using federated learning with other institutions. Another limitation of this study is the lack of international standards for TMLI target delineation. Even when the LNs are delineated according to the available contouring guidelines for several organs, concerns regarding which LN areas should be targeted for TMLI persist. Additional fine-tuning of the model is necessary once relevant guidelines become available.

Conclusions
The dose coverage in the inguinal, external iliac, and para-aortic LN areas is suboptimal if a TMI treatment plan is implemented. The current research demonstrates that the automatic delineation of LN areas using deep learning can facilitate the implementation of the TMLI plan.
Fig 3 shows an example of manually delineated LN areas versus the LN areas predicted by the deep learning model in a patient. The blue and red lines denote manually delineated areas and areas predicted using the deep learning model, respectively. Table 2 shows the mean volumes (in cc) of manually delineated LN areas and predicted LN areas for patients in the test group.